This repository contains the data and code for replicating the paper "Homicides involving Black victims are less likely to be cleared in the United States", forthcoming in *Criminology*. I apologize in advance if the scripts contain typos or are not fully polished. If you find errors, please contact me. On an 8-core machine, the full set of scripts should take between 24 and 36 hours to run.


*********  DATA UPLOAD AND PREPARATION ********* 


The following datasets are available in the current repository: 

- 'dataset76_20.csv' which contains the MAP data used in the analyses

- 'nibrs_prepared_dataset_91_20.csv' which contains the NIBRS data used in the analyses

- 'nibrs_prepared_dataset_91_20_days.csv' which contains the NIBRS data used in the analyses along with information on the number of days between the event and the arrest of the suspect

- 'dataset_map_equal_ready_with_ses.csv' which contains the MAP-Washington Post linked observations with the same outcome (cleared/uncleared)

- 'dataset_map_discordant_ready_with_ses.csv' which contains all the MAP-Washington Post linked observations, with outcomes (cleared/uncleared) switched when discordant based on information contained in the WP data

- 'homicide_data_geocodio_full.csv' which contains data on socio-economic and demographic indicators for locations of homicides recorded in the WP dataset

- 'states_region.csv', an ancillary dataset with info on each state's US Census region classification
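
Before running any analysis script, it can help to confirm that all seven files are in place. The following is a minimal sketch in R, assuming the CSV files sit in a single directory (the working directory by default). The helper name `missing_datasets` is hypothetical and not part of the repository.

```r
# File names as listed above; their location on disk is an assumption.
expected_datasets <- c(
  "dataset76_20.csv",
  "nibrs_prepared_dataset_91_20.csv",
  "nibrs_prepared_dataset_91_20_days.csv",
  "dataset_map_equal_ready_with_ses.csv",
  "dataset_map_discordant_ready_with_ses.csv",
  "homicide_data_geocodio_full.csv",
  "states_region.csv"
)

# Hypothetical helper: returns the expected files that are absent from `dir`.
missing_datasets <- function(dir = ".") {
  present <- file.exists(file.path(dir, expected_datasets))
  expected_datasets[!present]
}

# Report which files, if any, are missing from the working directory.
missing <- missing_datasets(".")
if (length(missing) > 0) {
  message("Missing datasets: ", paste(missing, collapse = ", "))
} else {
  message("All seven datasets found.")
}
```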

A scholar interested in replicating the results of the paper can use these seven datasets right away. Alternatively, they can download the original versions of the MAP, NIBRS and WP data and repeat the filtering and preprocessing that made the datasets ready for analysis. This route is considerably slower, especially for the NIBRS data. To perform the entire data downloading and cleaning process:

- For the MAP dataset, visit https://www.murderdata.org/p/data-docs.html and download the MAP dataset via the dedicated link in the third paragraph. Then, run the script 'map_preprocessing.R'. Once run, you should get a dataset identical to 'dataset76_20.csv' listed above.

- For the NIBRS dataset, visit Jacob Kaplan's ICPSR page collecting the National Incident-Based Reporting System concatenated files at https://www.openicpsr.org/openicpsr/project/118281/version/V6/view and download all relevant data. To perform the preprocessing necessary to obtain a dataset identical to 'nibrs_prepared_dataset_91_20.csv', run the following scripts in this order:

- 'nibrs_processing.R'
- 'nibrs_dataset_preparation.R'


After running these two, you should get a dataset identical to 'nibrs_prepared_dataset_91_20.csv'.

To instead perform the preprocessing necessary to obtain a dataset identical to 'nibrs_prepared_dataset_91_20_days.csv', run the following scripts in this order:

- 'nibrs_processing_witharrestdelay.R'
- 'nibrs_dataset_preparation_witharrestdelay.R'

After running these two, you should get a dataset identical to 'nibrs_prepared_dataset_91_20_days.csv'.

- For the WP dataset, the original WaPo data are available at https://github.com/washingtonpost/data-homicides. The pre-processed dataset linked with Geocodio variables is 'homicide_data_geocodio_full.csv'. Use it by running the following scripts in this order:
- 'wp_preprocessing_ses.R'
- 'robustness_wp_dataset_preparation_ses.R'
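
The script sequences above could be run from a single R session along these lines. This is a sketch under assumptions: the scripts are in the working directory, each one is self-contained, and the helper `run_in_order` is hypothetical rather than something the repository provides.

```r
# Script sequences as described above, in the stated order.
preprocessing_steps <- list(
  map        = "map_preprocessing.R",
  nibrs      = c("nibrs_processing.R", "nibrs_dataset_preparation.R"),
  nibrs_days = c("nibrs_processing_witharrestdelay.R",
                 "nibrs_dataset_preparation_witharrestdelay.R"),
  wp         = c("wp_preprocessing_ses.R",
                 "robustness_wp_dataset_preparation_ses.R")
)

# Hypothetical helper: checks that every script exists before starting,
# then sources them one by one in the order given.
run_in_order <- function(scripts, dir = ".") {
  paths <- file.path(dir, scripts)
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Script(s) not found: ", paste(absent, collapse = ", "))
  }
  for (path in paths) {
    message("Running ", path)
    source(path)
  }
  invisible(paths)
}

# Example (not run here, since the NIBRS steps are slow):
# run_in_order(preprocessing_steps$nibrs)
```

Checking for all scripts up front avoids discovering a missing file only after a long first step has finished.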

All scripts below should be run in R. Some matching and estimation steps can be particularly slow because of the large number of observations.

*********  DESCRIPTIVE EVIDENCE AND MAIN MODELS ********* 

*** PLOTS AND MAPS

To replicate the figure in the descriptive evidence subsection, as well as the plots in the Supplementary Materials describing the variables used for matching and adjustment in the MAP and NIBRS data, run the 'plots_and_maps.R' script.


*** NATIONAL MODELS

To replicate the national-level analyses on the MAP (1991-2020) and NIBRS datasets, run the 'map_and_nibrs_combined_main_models.R' script. It contains all the matching and estimation procedures needed to obtain the results presented in the main text, including effect heterogeneity by sex category and decade.

To perform the sensitivity tests with the tipr package, run the 'sensitivity_tests.R' script.


*********  ROBUSTNESS AND SUPPLEMENTARY MODELS *********

**** 5 years instead of decades
To replicate the models shown in Table S5 of the SI Appendix, run the script 'ROBUSTNESS_map_and_nibrs_combined_main_models_5y.R'. It closely follows 'map_and_nibrs_combined_main_models.R', except that it uses 5-year time windows instead of decades as temporal matching and control variables.
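
To make the difference between the two temporal groupings concrete, here is a small illustration of binning a homicide year into decades versus 5-year windows. It is only a sketch: the function name `window_start` and the choice to start the bins at 1991 (taken from the 1991-2020 study period) are assumptions and may not match how the scripts define their windows.

```r
# Hypothetical helper: first year of the window that contains `year`,
# for windows of `width` years counted from `origin`.
window_start <- function(year, width, origin = 1991) {
  origin + ((year - origin) %/% width) * width
}

years <- c(1991, 1997, 2004, 2020)

decade_start    <- window_start(years, width = 10)  # decade grouping
five_year_start <- window_start(years, width = 5)   # 5-year grouping

# Side-by-side view of the two groupings for each example year.
data.frame(year = years, decade_start, five_year_start)
```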

**** adding victim-offender relationship
To replicate the models shown in Table S6 of the SI Appendix, run the script 'ROBUSTNESS_relationship_map_and_nibrs_combined_main_models.R'. It closely follows 'map_and_nibrs_combined_main_models.R', except that it adds information on the recorded relationship between the victim and the offender.


**** MAP (1976-2020)
To replicate the models that use the entire 1976-2020 period for the MAP data, as seen in Table S9, run the script 'ROBUSTNESS_map_1976_2020_models.R'. 


**** RACIAL DISPARITY IN TIME-TO-CLEARANCE
To replicate the models that use the number of days between homicide and arrest, as measured by the NIBRS data (Table S11), run the script 'ROBUSTNESS_days_arrest_and_nibrs_main_models.R'.

**** MAP/WASHINGTON POST MODELS
Run the script 'ROBUSTNESS_wp_map_models_with_ses.R' to obtain the results using the linked Washington Post-MAP data (Tables S15 and S16). 
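
For quick reference, the correspondence between supplementary tables and robustness scripts described above can be written as a small lookup. This is a convenience sketch; the object `robustness_scripts` and the helper `script_for_table` are hypothetical and not part of the repository.

```r
# Table-to-script correspondence as stated in this section.
robustness_scripts <- c(
  S5  = "ROBUSTNESS_map_and_nibrs_combined_main_models_5y.R",
  S6  = "ROBUSTNESS_relationship_map_and_nibrs_combined_main_models.R",
  S9  = "ROBUSTNESS_map_1976_2020_models.R",
  S11 = "ROBUSTNESS_days_arrest_and_nibrs_main_models.R",
  S15 = "ROBUSTNESS_wp_map_models_with_ses.R",
  S16 = "ROBUSTNESS_wp_map_models_with_ses.R"
)

# Hypothetical helper: returns the script associated with a table label.
script_for_table <- function(table) {
  if (!(table %in% names(robustness_scripts))) {
    stop("No robustness script listed for table ", table)
  }
  unname(robustness_scripts[table])
}

script_for_table("S11")
```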
